Wikidata Import 2023-05-15
Import
Import | |
---|---|
state | ✅ |
url | https://wiki.bitplan.com/index.php/Wikidata_Import_2023-05-15 |
target | QLever |
start | 2023-05-15 |
end | |
days | |
os | Ubuntu 22.04.2 LTS |
cpu | Intel(R) Xeon(R) Gold 6326 CPU @ 2.90GHz |
ram | 256 GB |
triples | |
comment | see Wikidata_Import_2023-01-24 |
Environment
qleverauto -e
needed software
docker → /usr/bin/docker ✅
top → /usr/bin/top ✅
df → /usr/bin/df ✅
jq → /usr/bin/jq ✅
lsb_release → /usr/bin/lsb_release ✅
free → /usr/bin/free ✅
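A check like the one above can be reproduced with `command -v`. This is only a minimal sketch and not the actual qleverauto implementation; the function name `check_tools` is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report where each required tool is installed.
# Prints one line per tool; returns non-zero if any tool is missing.
check_tools() {
  local missing=0 tool path
  for tool in "$@"; do
    if path="$(command -v "$tool")"; then
      printf '%s → %s ✅\n' "$tool" "$path"
    else
      printf '%s → not found ❌\n' "$tool"
      missing=1
    fi
  done
  return "$missing"
}

# Usage with the tool list from this page:
#   check_tools docker top df jq lsb_release free
```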
operating system
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 22.04.2 LTS
Release: 22.04
Codename: jammy
docker version
Docker version 23.0.5, build bc4487a
memory
total used free shared buff/cache available
Mem: 251Gi 73Gi 1.4Gi 49Mi 176Gi 174Gi
Swap: 9Gi 9Gi 0B
diskspace
/dev/sdb3 429G 113G 294G 28% /
tmpfs 126G 0 126G 0% /dev/shm
/dev/sdb1 1.1G 6.1M 1.1G 1% /boot/efi
/dev/sda1 11T 1.8T 9.1T 17% /hd/eneco
soft ulimit for files
1048576
df /hd/mantax/
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/nvme0n1p1 6200797708 2554036424 3615488792 42% /hd/mantax
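The ulimit and disk space values above can also be checked by a script before a long-running index build. The following is a sketch under assumptions: the function names are hypothetical and the disk space threshold in the usage comment is an arbitrary example, not a documented QLever requirement. Only the value 1048576 comes from this page (it is the soft limit set in the index build command).

```shell
#!/usr/bin/env bash
# Print the available space (in KiB) of the filesystem holding path $1.
avail_kib() {
  df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Succeed if the soft limit for open files is at least $1.
nofile_at_least() {
  local limit
  limit="$(ulimit -Sn)"
  [ "$limit" = "unlimited" ] || [ "$limit" -ge "$1" ]
}

# Succeed if path $1 has at least $2 GiB available.
has_free_gib() {
  [ "$(avail_kib "$1")" -ge $(( $2 * 1024 * 1024 )) ]
}

# Usage (the 1000 GiB threshold is an assumption for illustration):
#   nofile_at_least 1048576 || echo "soft ulimit for files too low"
#   has_free_gib /hd/mantax 1000 || echo "not enough disk space"
```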
QLever control
https://github.com/ad-freiburg/qlever-control
mkdir qlever
cd qlever
git clone https://github.com/ad-freiburg/qlever-control
Cloning into 'qlever-control'...
remote: Enumerating objects: 426, done.
remote: Counting objects: 100% (266/266), done.
remote: Compressing objects: 100% (170/170), done.
remote: Total 426 (delta 108), reused 231 (delta 95), pack-reused 160
Receiving objects: 100% (426/426), 131.00 KiB | 585.00 KiB/s, done.
Resolving deltas: 100% (163/163), done.
setup wikidata
mkdir wikidata
cd wikidata/
. ../qlever-control/qlever wikidata
QLEVER CONFIG
Checking your PATH ...
Added the directory "/hd/mantax/qlever/qlever-control" to your PATH
Setting up bash autocompletion ...
Done, number of completions: 35
Creating new Qleverfile ...
Copied pre-configured Qleverfile for "wikidata" into current directory.
Setup is complete
Type qlever and use autocompletion to see which actions are available. Add a
"show" in the end to see what an action does without executing it (for example,
qlever index show). Edit your local Qleverfile to change settings. A typical
sequence of actions if you have used a preconfigured Qleverfile is:
qlever get-data
qlever index
qlever start
qlever example-query
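As the setup message says, appending "show" prints what an action would do without executing it. A small sketch that previews the whole typical sequence this way; the function name `show_actions` and the `QLEVER` override variable are assumptions for illustration, not part of qlever-control.

```shell
#!/usr/bin/env bash
# Preview each action of the typical sequence without executing it.
# QLEVER may be overridden (e.g. QLEVER=echo) to try the loop on a
# machine where qlever-control is not set up.
show_actions() {
  local action
  for action in get-data index start example-query; do
    "${QLEVER:-qlever}" "$action" show
  done
}

# Usage, in the directory that contains the Qleverfile:
#   show_actions
```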
get-data (~7 h 30 min)
nohup qlever get-data&
tail nohup.out
440650K ... 100% 6.46T=2m17s
2023-05-15 18:20:15 (3.13 MB/s) - ‘latest-lexemes.ttl.bz2’ saved [451229154/451229154]
FINISHED --2023-05-15 18:20:15--
Total wall clock time: 7h 30m 9s
Downloaded: 2 files, 95G in 7h 30m 9s (3.61 MB/s)
ls -l
-rw-rw-r-- 1 wf wf 101738463320 May 11 13:38 latest-all.ttl.bz2
-rw-rw-r-- 1 wf wf 451229154 May 13 01:33 latest-lexemes.ttl.bz2
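The summary figures reported by wget can be cross-checked from the file sizes and the wall clock time. The sketch below uses only numbers shown above; the function name `download_stats` is made up. The result matches the reported 95G and 3.61 MB/s when binary units (GiB, MiB) are used.

```shell
#!/usr/bin/env bash
# Recompute download volume and average rate from the values on this page.
download_stats() {
  local all_bytes=101738463320      # latest-all.ttl.bz2 (from ls -l)
  local lexemes_bytes=451229154     # latest-lexemes.ttl.bz2 (from ls -l)
  local seconds=$(( 7 * 3600 + 30 * 60 + 9 ))   # 7h 30m 9s wall clock time
  awk -v a="$all_bytes" -v b="$lexemes_bytes" -v s="$seconds" 'BEGIN {
    total = a + b
    printf "total: %.1f GiB\n", total / 1024^3
    printf "rate: %.2f MiB/s\n", total / s / 1024^2
  }'
}

download_stats
# total: 95.2 GiB
# rate: 3.61 MiB/s
```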
index
update qlever docker image
docker pull adfreiburg/qlever
doindex
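The doindex script works around https://github.com/ad-freiburg/qlever-control/issues/15 by collecting the @prefix definitions of both dumps into one file and feeding that file to the index builder ahead of the dumps: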
for F in latest-lexemes.ttl.bz2 latest-all.ttl.bz2
do
bzcat "$F" | head -1000 | \grep ^@prefix
done | sort -u > wikidata-latest.prefix-definitions
docker run --rm -u 10000:10000 \
-v /etc/localtime:/etc/localtime:ro \
-v /hd/mantax/qlever/wikidata:/index -w /index \
--entrypoint bash --name qlever.wikidata-latest.index-build \
adfreiburg/qlever -c "ulimit -Sn 1048576; bzcat -f wikidata-latest.prefix-definitions latest-lexemes.ttl.bz2 latest-all.ttl.bz2 | IndexBuilderMain -F ttl -f - -i wikidata-latest -s wikidata-latest.settings.json --stxxl-memory-gb 10 | tee wikidata-latest.index-log.txt"
nohup ./doindex &
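The prefix extraction step of doindex can be tried on a small sample before running it against the full dumps. This is a sketch only: the function name `prefix_lines` and the sample Turtle lines are illustrative and not part of the original script. Note that `bzcat -f` in the docker command passes the uncompressed prefix file through unchanged, which is why it can be listed alongside the .bz2 dumps.

```shell
#!/usr/bin/env bash
# Print the distinct @prefix lines found in the first 1000 lines of stdin,
# mirroring the head/grep/sort steps of doindex.
prefix_lines() {
  head -n 1000 | grep '^@prefix' | sort -u
}

# Small illustrative sample instead of a real dump:
printf '%s\n' \
  '@prefix wd: <http://www.wikidata.org/entity/> .' \
  '@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .' \
  '@prefix wd: <http://www.wikidata.org/entity/> .' \
  'wd:Q42 rdfs:label "Douglas Adams"@en .' | prefix_lines

# With a real dump:
#   bzcat latest-lexemes.ttl.bz2 | prefix_lines
```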
Log
2023-05-15 20:38:48.787 - INFO: QLever IndexBuilder, compiled on Mon May 1 10:21:29 UTC 2023 using git hash 83f1e8
2023-05-15 20:38:48.787 - INFO: You specified the input format: TTL
2023-05-15 20:38:48.788 - INFO: You specified "locale = en_US" and "ignore-punctuation = 1"
2023-05-15 20:38:48.788 - INFO: You specified "num-triples-per-batch = 5,000,000", choose a lower value if the index builder runs out of memory
2023-05-15 20:38:48.788 - INFO: Integers that cannot be represented by QLever will throw an exception (this is the default behavior)
2023-05-15 20:38:48.788 - INFO: Processing input triples from /dev/stdin ...
2023-05-15 20:40:34.263 - INFO: Input triples processed: 100,000,000
...
2023-05-16 02:44:35.983 - INFO: Input triples processed: 18,500,000,000
2023-05-16 02:45:56.492 - INFO: Done, total number of triples read: 18,572,955,199 [may contain duplicates]
2023-05-16 02:45:56.492 - INFO: Number of QLever-internal triples created: 11,318,825,076 [may contain duplicates]
2023-05-16 02:45:56.492 - INFO: Merging partial vocabularies in byte order (internal only) ...
2023-05-16 02:47:12.954 - INFO: Words merged: 100,000,000
...
2023-05-16 03:01:05.155 - INFO: Words merged: 800,000,000
2023-05-16 03:01:41.669 - INFO: Number of words in internal vocabulary: 861,507,414
2023-05-16 03:01:41.669 - INFO: Building prefix tree from internal vocabulary ...
2023-05-16 03:02:00.943 - INFO: Words processed: 100,000,000
...
2023-05-16 03:06:50.205 - INFO: Words processed: 800,000,000
2023-05-16 03:07:15.366 - INFO: Computing maximally compressing prefixes (greedy algorithm) ...
2023-05-16 03:19:52.577 - INFO: Reduction of size of internal vocabulary: 45%
2023-05-16 03:20:17.322 - INFO: Merging partial vocabularies in Unicode order (internal and external) ...
2023-05-16 03:23:39.976 - INFO: Words merged: 100,000,000
...
2023-05-16 05:05:07.691 - INFO: Words merged: 3,300,000,000
2023-05-16 05:08:47.811 - INFO: Number of words in external vocabulary: 2,529,812,916
2023-05-16 05:08:47.811 - INFO: Removing temporary files ...
2023-05-16 05:08:58.512 - INFO: Converting external vocabulary to binary format ...
2023-05-16 05:29:57.049 - INFO: Converting triples from local IDs to global IDs ...
2023-05-16 05:30:04.051 - INFO: Triples converted: 100,000,000
...
2023-05-16 06:43:46.833 - INFO: Triples converted: 29,800,000,000
2023-05-16 06:43:59.778 - INFO: Done, total number of triples converted: 29,891,780,275
2023-05-16 06:43:59.816 - INFO: Writing compressed vocabulary to disk ...
2023-05-16 06:48:30.007 - INFO: Creating a pair of index permutations ...
2023-05-16 09:35:45.922 - INFO: Statistics for PSO: #relations = 70,309, #blocks = 802,770, #triples = 24,908,374,000
2023-05-16 09:35:45.930 - INFO: Statistics for POS: #relations = 70,309, #blocks = 802,770, #triples = 24,908,374,000
2023-05-16 09:35:45.930 - INFO: Writing meta data for PSO and POS ...
2023-05-16 09:36:00.367 - INFO: Creating a pair of index permutations ...
2023-05-16 10:53:58.155 - INFO: Statistics for SPO: #relations = 2,953,527,247, #blocks = 533,058, #triples = 24,908,374,000
2023-05-16 10:53:58.157 - INFO: Statistics for SOP: #relations = 2,953,527,247, #blocks = 533,058, #triples = 24,908,374,000
2023-05-16 10:53:58.157 - INFO: Writing meta data for SPO and SOP ...
2023-05-16 10:54:07.257 - INFO: Number of distinct patterns: 8,156,126
2023-05-16 10:54:07.257 - INFO: Number of subjects with pattern: 1,953,415,853 [all]
2023-05-16 10:54:07.257 - INFO: Total number of distinct subject-predicate pairs: 10,724,039,637
2023-05-16 10:54:07.257 - INFO: Average number of predicates per subject: 5.5
2023-05-16 10:54:07.266 - INFO: Average number of subjects per predicate: 207,537
2023-05-16 10:54:24.867 - INFO: Creating a pair of index permutations ...
2023-05-16 12:14:01.504 - INFO: Statistics for OSP: #relations = 3,353,488,602, #blocks = 692,046, #triples = 24,908,374,000
2023-05-16 12:14:01.507 - INFO: Statistics for OPS: #relations = 3,353,488,602, #blocks = 692,046, #triples = 24,908,374,000
2023-05-16 12:14:01.507 - INFO: Writing meta data for OSP and OPS ...
2023-05-16 12:14:05.301 - INFO: Index build completed
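The duration of the index build, or of any single phase, follows from the log timestamps. A minimal sketch, assuming GNU date (as shipped with Ubuntu); the function name `elapsed` is made up. The timestamps are interpreted as UTC only to avoid time zone effects, which does not change the difference here.

```shell
#!/usr/bin/env bash
# Print the time between two "YYYY-MM-DD hh:mm:ss" timestamps.
elapsed() {
  local start end total
  start="$(date -u -d "$1" +%s)"
  end="$(date -u -d "$2" +%s)"
  total=$(( end - start ))
  printf '%dh %dm %ds\n' \
    $(( total / 3600 )) $(( total % 3600 / 60 )) $(( total % 60 ))
}

# First and last log entry of the index build:
elapsed '2023-05-15 20:38:48' '2023-05-16 12:14:05'
# 15h 35m 17s
```

The same function applied to the first entry and the "total number of triples read" entry (02:45:56) gives the time spent parsing the input.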
start
nohup qlever start&
Executing "start":
docker run -d --restart unless-stopped -u 10000:10000 -it -v /etc/localtime:/etc/localtime:ro -v /hd/mantax/qlever/wikidata:/index -p 7001:7001 -w /index --entrypoint bash --name qlever.wikidata-latest adfreiburg/qlever -c "ServerMain -i wikidata-latest -j 8 -p 7001 -m 50 -c 30 -e 5 -k 100 -a \"wikidata-latest_1432218987\" > wikidata-latest.server-log.txt" > /dev/null
Starting the QLever server in the background and waiting until it's ready (Ctrl+C will not kill it) ...
2023-05-17 08:28:38.072 - INFO: QLever Server, compiled on Mon May 1 10:21:29 UTC 2023 using git hash 83f1e8
2023-05-17 08:28:38.091 - INFO: Initializing server ...
2023-05-17 08:28:38.094 - INFO: The git hash used to build this index was 83f1e8
2023-05-17 08:28:38.095 - INFO: Reading vocabulary from file wikidata-latest.vocabulary.internal ...
2023-05-17 08:29:05.872 - INFO: Done, number of words: 861,507,415
2023-05-17 08:29:05.889 - INFO: Number of words in external vocabulary: 2,529,812,915
2023-05-17 08:29:06.095 - INFO: Registered PSO permutation: #relations = 70,309, #blocks = 802,770, #triples = 24,908,374,000
2023-05-17 08:29:06.323 - INFO: Registered POS permutation: #relations = 70,309, #blocks = 802,770, #triples = 24,908,374,000
2023-05-17 08:29:06.503 - INFO: Registered OPS permutation: #relations = 3,353,488,602, #blocks = 692,046, #triples = 24,908,374,000
2023-05-17 08:29:06.683 - INFO: Registered OSP permutation: #relations = 3,353,488,602, #blocks = 692,046, #triples = 24,908,374,000
2023-05-17 08:29:06.820 - INFO: Registered SPO permutation: #relations = 2,953,527,247, #blocks = 533,058, #triples = 24,908,374,000
2023-05-17 08:29:06.958 - INFO: Registered SOP permutation: #relations = 2,953,527,247, #blocks = 533,058, #triples = 24,908,374,000
2023-05-17 08:29:06.958 - INFO: Reading patterns from file wikidata-latest.index.patterns ...
2023-05-17 08:29:30.494 - INFO: Sorting random result tables to estimate the sorting performance of this machine ...
2023-05-17 08:29:34.189 - INFO: Access token for restricted API calls is "wikidata-latest_1432218987"
2023-05-17 08:29:34.189 - INFO: The server is ready, listening for requests on port 7001 ...
2023-05-17 08:29:35.004 - INFO:
2023-05-17 08:29:35.004 - INFO: Request received via GET, no content type specified
2023-05-17 08:29:35.005 - INFO: Alive check with message "from the qlever script"
2023-05-17 08:29:35.022 - INFO:
2023-05-17 08:29:35.022 - INFO: Request received via GET, no content type specified
2023-05-17 08:29:35.022 - INFO: Setting index description to: "Full Wikidata dump (latest-all.ttl.bz2 from 11.05.2023, latest-lexemes.ttl.bz2 from 13.05.2023)"
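Once the server is listening on port 7001, it can be queried over HTTP. The following is a sketch under assumptions: the function name `qlever_query` and the variables `QLEVER_ENDPOINT` and `CURL` are hypothetical, and the endpoint URL assumes the query is sent from the machine running the container. The `CURL` override exists only so that the command line can be inspected without a running server.

```shell
#!/usr/bin/env bash
# Send a SPARQL query via GET and request JSON results.
# QLEVER_ENDPOINT and CURL are illustrative overrides (e.g. CURL=echo
# prints the curl arguments instead of sending the request).
qlever_query() {
  local endpoint="${QLEVER_ENDPOINT:-http://localhost:7001}"
  "${CURL:-curl}" -s -G "$endpoint" \
    -H 'Accept: application/sparql-results+json' \
    --data-urlencode "query=$1"
}

# Usage:
#   qlever_query 'SELECT * WHERE { ?s ?p ?o } LIMIT 5'
```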
QLever UI
git clone https://github.com/ad-freiburg/qlever-ui.git qlever-ui
Cloning into 'qlever-ui'...
remote: Enumerating objects: 4593, done.
remote: Counting objects: 100% (731/731), done.
remote: Compressing objects: 100% (180/180), done.
remote: Total 4593 (delta 539), reused 703 (delta 537), pack-reused 3862
Receiving objects: 100% (4593/4593), 6.23 MiB | 2.87 MiB/s, done.
Resolving deltas: 100% (2332/2332), done.
wf@wikidata:/hd/mantax/qlever$ cd qlever-ui
wf@wikidata:/hd/mantax/qlever/qlever-ui$ mv qlever/settings_secret_template.py qlever/settings_secret.py
wf@wikidata:/hd/mantax/qlever/qlever-ui$ vi qlever/settings_secret.py
wf@wikidata:/hd/mantax/qlever/qlever-ui$ docker build -t qleverui .
[+] Building 10.3s (6/10)
=> [internal] load metadata for docker.io/library/python:3.10.2-alpin 1.7s
=> [1/6] FROM docker.io/library/python:3.10.2-alpine3.15@sha256:4eff1 2.2s
=> => resolve docker.io/library/python:3.10.2-alpine3.15@sha256:4eff1 0.0s
=> => sha256:4eff19dfce481c125674c902b24aa6667b9bc166 1.65kB / 1.65kB 0.0s
=> => sha256:b716677823ca2fc111863461d0ca76323cdeca83 1.37kB / 1.37kB 0.0s
=> => sha256:69fba17b9bae588a6fd69f3c3804ea61d32873ca 7.06kB / 7.06kB 0.0s
=> => sha256:07a400e93df3fcc09e5f874878c049b15515 678.30kB / 678.30kB 0.2s
=> => sha256:64052ee245ef4746c5150927b6adea7276a968 13.18MB / 13.18MB 0.6s
=> => sha256:a44d093ad4a590eade1fe51a698359a267edfc74cad1 234B / 234B 0.3s
=> => extracting sha256:07a400e93df3fcc09e5f874878c049b15515236f55fbf 0.5s
=> => sha256:0381087ee06555e8aa46ebb251894303c728f812 2.34MB / 2.34MB 0.6s
=> => extracting sha256:64052ee245ef4746c5150927b6adea7276a96820a8b5a 0.8s
=> => extracting sha256:a44d093ad4a590eade1fe51a698359a267edfc74cad16 0.0s
=> => extracting sha256:0381087ee06555e8aa46ebb251894303c728f8124e29d 0.3s
=> [internal] load build context 0.2s
=> => transferring context: 8.46MB 0.2s
=> [2/6] ADD requirements.txt /app/requirements.txt 1.1s
=> [3/6] RUN set -ex && python -m venv /env && /env/bin/pip i 5.2s
=> => # + python -m venv /env
=> => # + /env/bin/pip install --upgrade pip
=> => # Requirement already satisfied: pip in /env/lib/python3.10/site-pack
=> => # ages (21.2.4)
=> [4/6] RUN set -ex && runDeps="$(scanelf --needed --nobanner -- 2.1s
=> [5/6] COPY . /app 0.1s
=> [6/6] WORKDIR /app 0.0s
=> exporting to image 2.3s
=> => exporting layers 2.3s
=> => writing image sha256:91f1e48d218a3941edf61718bad84447eed1f19730 0.0s
=> => naming to docker.io/library/qleverui 0.0s
Setting up database
docker run -it --rm \
-v "$(pwd)/db:/app/db" \
--entrypoint "bash" qleverui
bash-5.1# python manage.py migrate
Operations to perform:
Apply all migrations: admin, auth, backend, contenttypes, sessions
Running migrations:
Applying auth.0012_alter_user_first_name_max_length... OK
create superuser
python manage.py createsuperuser
Username (leave blank to use 'root'): wf
Email address: wf@bitplan.com
Password:
Password (again):
Superuser created successfully.
run service
docker run -d --restart=unless-stopped -p 7000:7000 \
-v "$(pwd)/db:/app/db" \
--name qleverui \
qleverui